Abschlussprojekt - Data Analytics¶

Das Abschlussprojekt besteht aus zwei Teilen:

                                            1. Teil - Datenanalyse
                                            2. Teil - Dashboard

1. Teil - Datenanalyse¶

Du arbeitest als Data Scientist bei der P-2-P-Plattform https://www.kiva.org/, die vor einem Jahr gegründet wurde. Nun wollt ihr euer Geschäft erweitern. Euer Team hat sich aufgeteilt und jeder Analyst hat einen Teilbereich der Daten. Deine Aufgabe ist es in einer explorativen Datenanalyse Insights für eure Plattform herauszufinden.

Anforderungen:

            - vollständige Datenanalyse
            - Dokumentation der Arbeitsschritte
            - nachvollziehbare Erläuterungen über Vorgehensweise/Entscheidungen
            - 3 verschiedenartige Plots
            - Customizing der Plots
            - Analyse der Plots
            - Coding

Die Dokumentation kann in englisch oder deutsch erfolgen.

Treff ist Donnerstag um 14:30 Uhr im Hauptraum. Da werden wir die Unterlagen in dokumentensichere Formate umwandeln. Das machen wir zusammen. Danach habt ihr bis 15:35 Uhr zeit, eure zu bewertenden Unterlagen in den Projektabgabeordner hochzuladen.

Am Freitag erfolgt die Vorstellung der Plots innerhalb des Zeitrahmens von 5min - (+-1min) ist dabei ok.

CRISP DM: Business Understanding¶

Euer Geschäftsmodell ist das Betreiben einer Plattform (crowd-investing) bei der sich Personen die eine Geschäftsidee haben, aber nicht das benötigte Geld, anmelden und für ihr Projekt innerhalb einer vorgegebenen Zeit Geld sammeln können.

Auf der anderen Seite habt ihr Geldgeber, die gern ihr Geld in Projekte anlegen möchten und nach Investitionen suchen.

Als Vermittler bringt eure Plattform also Geldnehmer und Geldgeber zusammen. Ihr verdient euer Geld mit einer Provision für jedes Projekt was auf eurer Plattform landet. Der Geldgeber erhält einen Zins für die Geldleihe.

Deine Datenbasis ist die Historie eurer Plattform.

Alle Projekte sind abgeschlossene Projekte, d.h. die Zeit, um für sein Projekt Geld zu sammeln ist abgelaufen. Euer Geschäftsmodell sieht es vor, dass die gesammelten Gelder ausgezahlt werden, auch wenn der Zielbetrag nicht erreicht wurde.

CRISP DM: Data Understanding¶

- funded_amount ... mit Ablauf der "Crowding"zeit erhaltener Betrag/ ausgezahlter Betrag in USD
- loan_amount ... Zielbetrag (Betrag dem man für das Projekt erreichen wollte) in USD
- activity ... Unterkategory zu dem das Ziel des Crowdprojektes thematisch gehört
- sector ... Oberkategory in den das Crowdprojektes Thema fällt
- use ... Kurzbeschreibung wofür das Geld verwendet werden soll
- country_code ... Ländercode nach ISO Norm
- country ... Ländername nach ISO Norm
- region ... Region
- currency ... Währung in den der funded_amount dann ausgezahlt wurde
- term in months ... Dauer über die der Kredit ausgezahlt werden soll
- lender_count ...Darlehensgeber (also wieviele Personen Geld für das Projekt gegeben haben)
- borrower_genders ... Geschlecht und Anzahl der Darlehensnehmer, also diejenigen die das Crowdprojekt initiiert haben       
- repayment interval ... vertraglich vereinbarte Rückzahlungsmodalitäten/-rhythmus

CRISP DM: Data Preparation¶

In this section i will go ahead to prepare my data for Data analysis. This section will be furher divided into the following subsections .

  1. Cleaning my Data
    • Overview of my data and its columns data types
    • Are there any missen values,special characters or null values?
    • Are the duplicate rows?
    • How can missen values,and duplicate rows be solved?
    • Are there columns we do not need ?
  1. Creating Features
    • Can a meaningful correlation be found between columns?
    • Can new columns be created to aid with analysis?
  1. Optimization and Storage
    • Changing the data type of the columns in my data set to optimize storage
    • Saving my cleaned data set for analysis

Cleaning my Data¶

In [1]:
# Importing the necessary libraries

import pandas as pd
import numpy as np
In [2]:
#Reading my file

# I got a tokenizing error. I opened my file in excel to see what could be the problem.
#I discovered it had a # seperator 

df = pd.read_csv("data_abschlussprojekt.csv", sep="#")
df
Out[2]:
Unnamed: 0 funded_amount loan_amount activity sector use country_code country region currency term_in_months lender_count borrower_genders repayment_interval
0 0 300.0 300.0 Fruits & Vegetables Food To buy seasonal, fresh fruits to sell. PK Pakistan Lahore PKR 12.0 12 female irregular
1 1 575.0 575.0 Rickshaw Transportation to repair and maintain the auto rickshaw used ... PK Pakistan Lahore PKR 11.0 14 female, female irregular
2 2 150.0 150.0 Transportation Transportation To repair their old cycle-van and buy another ... IN India Maynaguri INR 43.0 6 female bullet
3 3 200.0 200.0 Embroidery Arts to purchase an embroidery machine and a variet... PK Pakistan Lahore PKR 11.0 8 female irregular
4 4 400.0 400.0 Milk Sales Food to purchase one buffalo. PK Pakistan Abdul Hakeem PKR 14.0 16 female monthly
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
671200 671200 0.0 25.0 Livestock Agriculture [True, u'para compara: cemento, arenya y ladri... PY Paraguay Concepción USD 13.0 0 female monthly
671201 671201 25.0 25.0 Livestock Agriculture [True, u'to start a turducken farm.'] - this l... KE Kenya NaN KES 13.0 1 female monthly
671202 671202 0.0 25.0 Games Entertainment NaN KE Kenya NaN KES 13.0 0 NaN monthly
671203 671203 0.0 25.0 Livestock Agriculture [True, u'to start a turducken farm.'] - this l... KE Kenya NaN KES 13.0 0 female monthly
671204 671204 0.0 25.0 Livestock Agriculture [True, u'to start a turducken farm.'] - this l... KE Kenya NaN KES 13.0 0 female monthly

671205 rows × 14 columns

Viewing my columns

In [3]:
# Finding out my column names

df.columns
Out[3]:
Index(['Unnamed: 0', ' funded_amount', ' loan_amount', ' activity', ' sector',
       ' use', ' country_code', ' country', ' region', ' currency',
       ' term_in_months', ' lender_count', ' borrower_genders',
       ' repayment_interval'],
      dtype='object')
In [4]:
# I discover that my column names have trailing spaces at the end. 
#I adjust for easy use 

df.columns = df.columns.str.replace(' ', '')
df.columns
Out[4]:
Index(['Unnamed:0', 'funded_amount', 'loan_amount', 'activity', 'sector',
       'use', 'country_code', 'country', 'region', 'currency',
       'term_in_months', 'lender_count', 'borrower_genders',
       'repayment_interval'],
      dtype='object')
In [5]:
# Let's drop the Unamed column
df = df.drop('Unnamed:0',axis=1)

Duplicate Rows

In [6]:
#Check if there are any duplicate rows

# I will find duplictaes through out the whole table


df.duplicated().sum()
Out[6]:
24372
In [7]:
df.loc[df.duplicated()==True,:]
Out[7]:
funded_amount loan_amount activity sector use country_code country region currency term_in_months lender_count borrower_genders repayment_interval
498 100.0 100.0 Home Energy Personal Use to buy a solar-powered lamp. SV El Salvador NaN USD 14.0 4 male monthly
606 100.0 100.0 Home Energy Personal Use to buy a solar-powered lamp. SV El Salvador NaN USD 14.0 4 male monthly
808 450.0 450.0 Higher education costs Education to pay for one semester's registration fees. CO Colombia Bogotà COP 7.0 15 female monthly
1703 500.0 500.0 Higher education costs Education To buy a laptop for educational purposes. SO Somalia Hargeisa USD 8.0 19 male monthly
2317 250.0 250.0 Poultry Agriculture to purchase poultry. KE Kenya Ndaragwa KES 16.0 10 female monthly
... ... ... ... ... ... ... ... ... ... ... ... ... ...
671200 0.0 25.0 Livestock Agriculture [True, u'para compara: cemento, arenya y ladri... PY Paraguay Concepción USD 13.0 0 female monthly
671201 25.0 25.0 Livestock Agriculture [True, u'to start a turducken farm.'] - this l... KE Kenya NaN KES 13.0 1 female monthly
671202 0.0 25.0 Games Entertainment NaN KE Kenya NaN KES 13.0 0 NaN monthly
671203 0.0 25.0 Livestock Agriculture [True, u'to start a turducken farm.'] - this l... KE Kenya NaN KES 13.0 0 female monthly
671204 0.0 25.0 Livestock Agriculture [True, u'to start a turducken farm.'] - this l... KE Kenya NaN KES 13.0 0 female monthly

24372 rows × 13 columns

How to deal with my duplicate rows

  • Firstly in this context what does a duplicate mean to me?

Because we do not have columns such as date , project Id and personal identification details of the borrowers, i will consider that any rows that has th eexact same contents in the columns , and especially in the 'use' column as a duplicate.

  • What do i do with these columns?

I will proceed to delete these columns and i have considered them as a double entry.

In [8]:
#Deleting my duplicated rows and proofing

df.drop_duplicates(inplace=True)
df.duplicated().sum()
Out[8]:
0

Column Data Types

In [9]:
df.dtypes
Out[9]:
funded_amount         float64
loan_amount           float64
activity               object
sector                 object
use                    object
country_code           object
country                object
region                 object
currency               object
term_in_months        float64
lender_count            int64
borrower_genders       object
repayment_interval     object
dtype: object

Null Values

In [10]:
#Checking for columns with null values
df.isnull().sum()
Out[10]:
funded_amount             0
loan_amount               0
activity                  0
sector                    0
use                    3899
country_code              8
country                   0
region                56163
currency                  0
term_in_months            0
lender_count              0
borrower_genders       3888
repayment_interval        0
dtype: int64
In [11]:
df.isnull().sum() * 100 / len(df)
Out[11]:
funded_amount         0.000000
loan_amount           0.000000
activity              0.000000
sector                0.000000
use                   0.602783
country_code          0.001237
country               0.000000
region                8.682767
currency              0.000000
term_in_months        0.000000
lender_count          0.000000
borrower_genders      0.601083
repayment_interval    0.000000
dtype: float64

Indepth analysis of columns with null values

Use Column

In [12]:
#Overview of use column with null values

df.loc[(df.use.isnull()),:]
Out[12]:
funded_amount loan_amount activity sector use country_code country region currency term_in_months lender_count borrower_genders repayment_interval
140 2975.0 2975.0 Food Production/Sales Food NaN TZ Tanzania NaN TZS 10.0 110 NaN monthly
145 1200.0 1200.0 Personal Expenses Personal Use NaN PE Peru NaN PEN 20.0 44 NaN monthly
170 4250.0 4250.0 Catering Food NaN TZ Tanzania NaN TZS 10.0 116 NaN monthly
412 2350.0 2350.0 Beauty Salon Services NaN TZ Tanzania NaN TZS 10.0 75 NaN monthly
414 725.0 725.0 Agriculture Agriculture NaN SV El Salvador NaN USD 20.0 19 NaN monthly
... ... ... ... ... ... ... ... ... ... ... ... ... ...
659604 5625.0 10000.0 Weaving Arts NaN BT Bhutan NaN USD 14.0 210 NaN irregular
660788 1975.0 1975.0 Home Energy Personal Use NaN PS Palestine NaN USD 27.0 39 NaN monthly
661718 800.0 1600.0 Furniture Making Manufacturing NaN HT Haiti NaN HTG 13.0 27 NaN irregular
671151 0.0 25.0 Livestock Agriculture NaN KE Kenya NaN KES 13.0 0 NaN monthly
671174 0.0 25.0 Games Entertainment NaN KE Kenya NaN KES 13.0 0 NaN monthly

3899 rows × 13 columns

Gender Column

In [13]:
df.loc[(df.borrower_genders.isnull()),:]
Out[13]:
funded_amount loan_amount activity sector use country_code country region currency term_in_months lender_count borrower_genders repayment_interval
140 2975.0 2975.0 Food Production/Sales Food NaN TZ Tanzania NaN TZS 10.0 110 NaN monthly
145 1200.0 1200.0 Personal Expenses Personal Use NaN PE Peru NaN PEN 20.0 44 NaN monthly
170 4250.0 4250.0 Catering Food NaN TZ Tanzania NaN TZS 10.0 116 NaN monthly
412 2350.0 2350.0 Beauty Salon Services NaN TZ Tanzania NaN TZS 10.0 75 NaN monthly
414 725.0 725.0 Agriculture Agriculture NaN SV El Salvador NaN USD 20.0 19 NaN monthly
... ... ... ... ... ... ... ... ... ... ... ... ... ...
659604 5625.0 10000.0 Weaving Arts NaN BT Bhutan NaN USD 14.0 210 NaN irregular
660788 1975.0 1975.0 Home Energy Personal Use NaN PS Palestine NaN USD 27.0 39 NaN monthly
661718 800.0 1600.0 Furniture Making Manufacturing NaN HT Haiti NaN HTG 13.0 27 NaN irregular
671151 0.0 25.0 Livestock Agriculture NaN KE Kenya NaN KES 13.0 0 NaN monthly
671174 0.0 25.0 Games Entertainment NaN KE Kenya NaN KES 13.0 0 NaN monthly

3888 rows × 13 columns

In [14]:
# Does a null value in Gender column imply a null in Region and Use column?
df.loc[(df.borrower_genders.isnull()) & (df.region.notnull()),:]
Out[14]:
funded_amount loan_amount activity sector use country_code country region currency term_in_months lender_count borrower_genders repayment_interval
In [15]:
df.loc[(df.borrower_genders.isnull()) & (df.use.notnull()),:]
Out[15]:
funded_amount loan_amount activity sector use country_code country region currency term_in_months lender_count borrower_genders repayment_interval
In [16]:
# In how many countries do we have a null value for gender
missing_gender = df.loc[(df.borrower_genders.isnull()), "country"].nunique()
missing_gender
Out[16]:
69
In [17]:
#How many countries are there in our data set

df.country.nunique()
Out[17]:
87

Region Column

In [18]:
df.loc[(df.region.isnull()),:]
Out[18]:
funded_amount loan_amount activity sector use country_code country region currency term_in_months lender_count borrower_genders repayment_interval
5 250.0 250.0 Services Services purchase leather for my business using ksh 20000. KE Kenya NaN KES 4.0 6 female irregular
49 450.0 450.0 General Store Retail to stock his store. SV El Salvador NaN USD 14.0 18 male monthly
54 225.0 225.0 Food Market Food to purchase various seasonal items to resell: ... SN Senegal NaN XOF 14.0 7 female monthly
67 125.0 125.0 Energy Services purchase solar lanterns for resale. KE Kenya NaN KES 3.0 6 male irregular
70 2000.0 2000.0 Retail Retail to install a display window and a sunshade for... IQ Iraq NaN USD 15.0 71 male monthly
... ... ... ... ... ... ... ... ... ... ... ... ... ...
671166 25.0 25.0 Livestock Agriculture [True, u'to start a turducken farm.'] - this l... KE Kenya NaN KES 13.0 1 female monthly
671167 0.0 25.0 Livestock Agriculture Kiva Coordinator fixed issue loan (no longer v... KE Kenya NaN KES 13.0 0 female, female monthly
671168 0.0 25.0 Livestock Agriculture Pretend the issue with loan got addressed by K... KE Kenya NaN KES 13.0 0 female monthly
671171 0.0 25.0 Games Entertainment Kiva Coordinator replaced loan use. Should see... KE Kenya NaN KES 13.0 0 female, female monthly
671174 0.0 25.0 Games Entertainment NaN KE Kenya NaN KES 13.0 0 NaN monthly

56163 rows × 13 columns

In [19]:
#how many countries have a null value for region
missing_region = df.loc[(df.region.isnull()), "country"].nunique()
missing_region
Out[19]:
72

Country code column

In [20]:
df.loc[(df.country_code.isnull()),:]
Out[20]:
funded_amount loan_amount activity sector use country_code country region currency term_in_months lender_count borrower_genders repayment_interval
202537 4150.0 4150.0 Wholesale Wholesale To purchase lighting products for sale to loca... NaN Namibia EEnhana NAD 6.0 162 female bullet
202823 4150.0 4150.0 Wholesale Wholesale To purchase lighting products for sale to loca... NaN Namibia Rundu NAD 6.0 159 male bullet
344929 3325.0 3325.0 Wholesale Wholesale To purchase lighting products for sale to loca... NaN Namibia EEnhana NAD 7.0 120 female bullet
351177 3325.0 3325.0 Wholesale Wholesale To purchase lighting products for sale to loca... NaN Namibia Rundu NAD 7.0 126 male bullet
420953 3325.0 3325.0 Wholesale Wholesale To purchase lighting products for sale to loca... NaN Namibia EEnhana NAD 7.0 118 female bullet
421218 4000.0 4000.0 Wholesale Wholesale purchase solar lighting products for sale to l... NaN Namibia Rundu NAD 7.0 150 male bullet
487207 5100.0 5100.0 Renewable Energy Products Retail to pay for stock of solar lights and cell phon... NaN Namibia Katima Mulilo NAD 7.0 183 male bullet
487653 5000.0 5000.0 Wholesale Wholesale to maintain a stock of solar lights and cell p... NaN Namibia Oshakati NAD 7.0 183 female bullet
In [21]:
#I want to find out if there are other applications from Namibia that have a country code imputed
df.loc[(df.country=="Namibia"),:]

#There are no other rows
Out[21]:
funded_amount loan_amount activity sector use country_code country region currency term_in_months lender_count borrower_genders repayment_interval
202537 4150.0 4150.0 Wholesale Wholesale To purchase lighting products for sale to loca... NaN Namibia EEnhana NAD 6.0 162 female bullet
202823 4150.0 4150.0 Wholesale Wholesale To purchase lighting products for sale to loca... NaN Namibia Rundu NAD 6.0 159 male bullet
344929 3325.0 3325.0 Wholesale Wholesale To purchase lighting products for sale to loca... NaN Namibia EEnhana NAD 7.0 120 female bullet
351177 3325.0 3325.0 Wholesale Wholesale To purchase lighting products for sale to loca... NaN Namibia Rundu NAD 7.0 126 male bullet
420953 3325.0 3325.0 Wholesale Wholesale To purchase lighting products for sale to loca... NaN Namibia EEnhana NAD 7.0 118 female bullet
421218 4000.0 4000.0 Wholesale Wholesale purchase solar lighting products for sale to l... NaN Namibia Rundu NAD 7.0 150 male bullet
487207 5100.0 5100.0 Renewable Energy Products Retail to pay for stock of solar lights and cell phon... NaN Namibia Katima Mulilo NAD 7.0 183 male bullet
487653 5000.0 5000.0 Wholesale Wholesale to maintain a stock of solar lights and cell p... NaN Namibia Oshakati NAD 7.0 183 female bullet

Our findings on Null Columns and the way forward.

From our investigation with the columns that have null values, the following was found:

Use Column

  • The rows with null values in the Use column constitute about 0.6% of our whole data set
  • From a glance it shows that some of the rows that have null values in this column also have null values in the region and gender column. The rest are not related to any other information in the dataset.

Therefore i will assume that the lack of values in this column is as result of negligence. Perharps the applicants forgot to state their specific uses, or the data base administrator ommitted it.

I will not delete these rows because funds were still provided for these applications and these rows may still aid in our analysis.

I will replace the null values with "not_provided"

Gender Column

  • The rows with null values in the Use column constitute about 0.6% of our whole data set
  • From a glance it shows that all the rows that have null values in this column also have null values in the region and use column.
  • Missing values are not specific to countries as almost 80% of countries have applications with missing gender

I will also assume a case of negligence in this column.

Because loans were still provided to applicants with no gender given i will not delete these rows.

However i will replace the null values with the modal value of the gender column.

Region Column

  • The rows with null values in the Use column constitute about 8% of our whole data set
  • Missing values are not specific to countries as almost 82% of countries have applications with a missing region

I will not delete the rows as it constitute a moderate part of our data set. However i will proceed to delete the whole column.

I believe that the country column is necessary enough.

Country_code Column

  • The missen values in this column contain applications from Namibia.

I will not delete these rows as these are the only rows or applications from Namibia. I will proceed to replace the value with "Nam" which is the universal country code for Namimbia. I will therefore conclude that this was an entry error. "Nan" was written instead of "Nam".

Update I later on realised that this data set follows the Aplha two code (NA) instead of the alpha three code (NAM). I later on changed the code from"Nam" to "NA"

1.Replacing null values in Use column with "not_provided"

In [22]:
df["use"].fillna("not_provided", inplace = True)

2.Replace Null values in gender with the mode of the gender column

In [23]:
df['borrower_genders'] = df['borrower_genders'].fillna(df['borrower_genders'].mode()[0])

3.Deleting the Region column

In [24]:
df.drop(columns=['region'],axis=7, inplace=True)

4.Replacing null values in county_code columnn with "Nam"

In [25]:
df["country_code"].fillna("Nam", inplace = True)

# Realised that this data set takes the alpha 2 code instead of the alpha 3 code (ISO norms), so will replace "Nam" with "NA"

df.loc[(df.country_code=="Nam"),"country_code"] ="NA"
In [26]:
#Proofing

df.loc[(df.country=="Namibia"),:]
Out[26]:
funded_amount loan_amount activity sector use country_code country currency term_in_months lender_count borrower_genders repayment_interval
202537 4150.0 4150.0 Wholesale Wholesale To purchase lighting products for sale to loca... NA Namibia NAD 6.0 162 female bullet
202823 4150.0 4150.0 Wholesale Wholesale To purchase lighting products for sale to loca... NA Namibia NAD 6.0 159 male bullet
344929 3325.0 3325.0 Wholesale Wholesale To purchase lighting products for sale to loca... NA Namibia NAD 7.0 120 female bullet
351177 3325.0 3325.0 Wholesale Wholesale To purchase lighting products for sale to loca... NA Namibia NAD 7.0 126 male bullet
420953 3325.0 3325.0 Wholesale Wholesale To purchase lighting products for sale to loca... NA Namibia NAD 7.0 118 female bullet
421218 4000.0 4000.0 Wholesale Wholesale purchase solar lighting products for sale to l... NA Namibia NAD 7.0 150 male bullet
487207 5100.0 5100.0 Renewable Energy Products Retail to pay for stock of solar lights and cell phon... NA Namibia NAD 7.0 183 male bullet
487653 5000.0 5000.0 Wholesale Wholesale to maintain a stock of solar lights and cell p... NA Namibia NAD 7.0 183 female bullet
In [27]:
#Proofing Null values
df.isnull().sum()
Out[27]:
funded_amount         0
loan_amount           0
activity              0
sector                0
use                   0
country_code          0
country               0
currency              0
term_in_months        0
lender_count          0
borrower_genders      0
repayment_interval    0
dtype: int64
In [28]:
#Doing some final checks
df.repayment_interval.unique()
Out[28]:
array(['irregular', 'bullet', 'monthly', 'weekly'], dtype=object)

Creating Features¶

I will be creating some extra columns which i may or may not use for my analyis. However i think it is important to get some more insights from this data set:

  • Term_category Column

I noticed that the duration of payment can range from one month to 158 months. It will be interesting to know if this is dependent on any other factor found in our data set. The website does not specify what the maximum disbursement duration is and if this depends on the sector or the amount of capital asked.

I will be dividing the values of the duration column in four categories: short_term_loan,medium_term_loan,long_term_loan,extra_long_term_loan.The categorization of these loans were done according to finance principles give and take and taken into consideration our Busienss Case.

a. short term 1mo-24mo

b. medium term 25mo-60mo

c. long term 61mo-120mo

d. extra long term 121mo and above

  • Rate of financing

This new column will tell us the rate at which requested amounts (loan_amount) are being financed(funded_amount). To calculte the values of this column i will use teh followingg formular

funded amount/loan amount *100

  • Loan status

Here i will divide the the funded_loan column between loans that were funded (percentage of funding) and not funded. This column will bbe built based on the results of the rate of financing column.

a. Not funded 0

b. Very poor funding 1-30

c. Poor Funding 30-49

d. Avg Funding 50-80

e. Good Funding 81-99

f. Fully Funded 100

1.term_category column

In [29]:
#min and max values for this column
df.term_in_months.max()
Out[29]:
158.0
In [30]:
df.term_in_months.min()
Out[30]:
1.0
In [31]:
df.term_in_months.unique()
Out[31]:
array([ 12.,  11.,  43.,  14.,   4.,  13.,  10.,   8.,   5.,  20.,   7.,
         3.,  17.,  15.,   9.,  39.,  23.,  26.,   6.,  30.,  22.,  25.,
        27.,  16.,  52.,  18.,  36.,  19.,  28.,  62.,  32.,  24.,  21.,
        50.,  38.,   2.,  35.,  72., 137.,   1.,  49.,  33.,  42.,  29.,
        37.,  51., 113.,  79.,  31.,  44.,  74.,  34.,  48., 114.,  40.,
       124., 104.,  63.,  85.,  78.,  70.,  86.,  61.,  60.,  67.,  55.,
        53.,  41.,  68., 143.,  77., 130.,  45., 111., 134., 107., 142.,
       148.,  56., 122., 133., 141., 110.,  81., 106.,  54., 147., 112.,
        59., 145., 121., 109.,  80.,  47.,  97.,  75., 101., 128.,  98.,
        87.,  71.,  66.,  46., 125.,  76.,  73., 120., 144., 118., 131.,
        65., 108.,  58., 123.,  84.,  99.,  82.,  92.,  69.,  91.,  57.,
        90.,  93., 129.,  89.,  88.,  64., 126., 138., 158.,  83., 100.,
       105., 132.,  96., 127., 135.,  95., 154., 156.,  94., 115., 102.,
       116., 136., 103., 139., 146.])
In [32]:
#creating categories for this column
category = ["short_term_loan", "medium_term_loan", "long_term_loan","extra_long_term_loan"]

max_value = df.loc[:,"term_in_months"].max()
min_value = df.loc[:,"term_in_months"].min()

term_range = [min_value - .001 * abs(min_value), 25, 61, 121,max_value + .001 * abs(max_value)]

df["term_cat"] = pd.cut(x=df.loc[:,"term_in_months"],
                                      bins=term_range,          
                                                                 
                                      labels=category            
                                     )

df["term_label"] = pd.cut(x=df.loc[:,"term_in_months"],
                                      bins=term_range,         
                                                                 
                                     )

df.head(2)
Out[32]:
funded_amount loan_amount activity sector use country_code country currency term_in_months lender_count borrower_genders repayment_interval term_cat term_label
0 300.0 300.0 Fruits & Vegetables Food To buy seasonal, fresh fruits to sell. PK Pakistan PKR 12.0 12 female irregular short_term_loan (0.999, 25.0]
1 575.0 575.0 Rickshaw Transportation to repair and maintain the auto rickshaw used ... PK Pakistan PKR 11.0 14 female, female irregular short_term_loan (0.999, 25.0]

2. rate_of_financing(%) column

In [33]:
df["rate_of_financing(%)"]=round((df['funded_amount']/df['loan_amount'])*100,2)
df.head(2)
Out[33]:
funded_amount loan_amount activity sector use country_code country currency term_in_months lender_count borrower_genders repayment_interval term_cat term_label rate_of_financing(%)
0 300.0 300.0 Fruits & Vegetables Food To buy seasonal, fresh fruits to sell. PK Pakistan PKR 12.0 12 female irregular short_term_loan (0.999, 25.0] 100.0
1 575.0 575.0 Rickshaw Transportation to repair and maintain the auto rickshaw used ... PK Pakistan PKR 11.0 14 female, female irregular short_term_loan (0.999, 25.0] 100.0

3.loan_status column

In [34]:
conditions = [
    (df['rate_of_financing(%)'] == 0),
    (df['rate_of_financing(%)'] > 0) & (df['rate_of_financing(%)'] <= 30),
    (df['rate_of_financing(%)'] > 30) & (df['rate_of_financing(%)'] <= 49),
    (df['rate_of_financing(%)'] > 49) & (df['rate_of_financing(%)'] <= 80),
    (df['rate_of_financing(%)'] > 80) & (df['rate_of_financing(%)'] < 100),
    (df['rate_of_financing(%)'] == 100 ),
    (df['rate_of_financing(%)'] > 100 )
    
              ]
choices = ["not_funded","very_poor_funding","poor_funding" ,"avg_funding", "good_funding","fully_funded","over_funded"]
df['loan_status'] = np.select(conditions,choices, default=0)
In [35]:
df.head(2)
Out[35]:
funded_amount loan_amount activity sector use country_code country currency term_in_months lender_count borrower_genders repayment_interval term_cat term_label rate_of_financing(%) loan_status
0 300.0 300.0 Fruits & Vegetables Food To buy seasonal, fresh fruits to sell. PK Pakistan PKR 12.0 12 female irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
1 575.0 575.0 Rickshaw Transportation to repair and maintain the auto rickshaw used ... PK Pakistan PKR 11.0 14 female, female irregular short_term_loan (0.999, 25.0] 100.0 fully_funded

Optimization and Storage¶

In this subsection we are going to check our data types again and make sure that they are in a format that can optimize data storage.

In [36]:
df.dtypes
Out[36]:
funded_amount            float64
loan_amount              float64
activity                  object
sector                    object
use                       object
country_code              object
country                   object
currency                  object
term_in_months           float64
lender_count               int64
borrower_genders          object
repayment_interval        object
term_cat                category
term_label              category
rate_of_financing(%)     float64
loan_status               object
dtype: object
In [37]:
#Changing the columns below to category type since the values in this column are repetitive



df_1=df.copy()
list_category = ['activity','sector','country_code','country','repayment_interval','loan_status','currency']

for column in list_category:
    df_1.loc[:,column] = df.loc[:,column].astype('category')
In [38]:
df.term_in_months.unique()
Out[38]:
array([ 12.,  11.,  43.,  14.,   4.,  13.,  10.,   8.,   5.,  20.,   7.,
         3.,  17.,  15.,   9.,  39.,  23.,  26.,   6.,  30.,  22.,  25.,
        27.,  16.,  52.,  18.,  36.,  19.,  28.,  62.,  32.,  24.,  21.,
        50.,  38.,   2.,  35.,  72., 137.,   1.,  49.,  33.,  42.,  29.,
        37.,  51., 113.,  79.,  31.,  44.,  74.,  34.,  48., 114.,  40.,
       124., 104.,  63.,  85.,  78.,  70.,  86.,  61.,  60.,  67.,  55.,
        53.,  41.,  68., 143.,  77., 130.,  45., 111., 134., 107., 142.,
       148.,  56., 122., 133., 141., 110.,  81., 106.,  54., 147., 112.,
        59., 145., 121., 109.,  80.,  47.,  97.,  75., 101., 128.,  98.,
        87.,  71.,  66.,  46., 125.,  76.,  73., 120., 144., 118., 131.,
        65., 108.,  58., 123.,  84.,  99.,  82.,  92.,  69.,  91.,  57.,
        90.,  93., 129.,  89.,  88.,  64., 126., 138., 158.,  83., 100.,
       105., 132.,  96., 127., 135.,  95., 154., 156.,  94., 115., 102.,
       116., 136., 103., 139., 146.])
In [39]:
#the term_in_months column has only whole numbers as values. So i will change it to interger

df_1['term_in_months'] = df_1['term_in_months'].astype(int)
In [40]:
#Proofing
df_1.dtypes
Out[40]:
funded_amount            float64
loan_amount              float64
activity                category
sector                  category
use                       object
country_code            category
country                 category
currency                category
term_in_months             int32
lender_count               int64
borrower_genders          object
repayment_interval      category
term_cat                category
term_label              category
rate_of_financing(%)     float64
loan_status             category
dtype: object

Explorative Data Analysis¶

In this section i will be analysing my dataset first as a whole an dwill then dig further to analyse based on sections depending on what i find in my original analysis. I will be using data aggregation and data viz to aid in my analysis.

How many projects were funded ?¶

In [41]:
df_f  = df.groupby(by='loan_status',as_index=False).agg(count_cat=('loan_status','count')).sort_values(['count_cat'],ascending=False)
df_f
Out[41]:
loan_status count_cat
1 fully_funded 598796
0 avg_funding 17808
6 very_poor_funding 11878
5 poor_funding 9946
2 good_funding 5125
3 not_funded 3278
4 over_funded 2
In [42]:
import plotly.express as px
In [43]:
fig = px.pie(df_f, 
             values='count_cat', 
             names='loan_status',
             title= "Fig 1.0 : Project funding KIVA 2016"
            )

fig.show()

Seeing as a very huge part of the applications were fully funded (92.6%), it means that the Kiva Organization has a good funding rate. It would however be intersting to see why some loans were not funded and if they share anything in common.

In [44]:
df.loc[df["loan_status"]== "not_funded",:]
Out[44]:
funded_amount loan_amount activity sector use country_code country currency term_in_months lender_count borrower_genders repayment_interval term_cat term_label rate_of_financing(%) loan_status
4814 0.0 5000.0 Food Production/Sales Food grow and create new jobs in my company. US United States USD 24.0 0 female bullet short_term_loan (0.999, 25.0] 0.0 not_funded
6784 0.0 10000.0 Cosmetics Sales Retail start a new stream of income by offering retai... US United States USD 36.0 0 male bullet medium_term_loan (25.0, 61.0] 0.0 not_funded
10735 0.0 5000.0 Services Services purchase new equipment and market to more pote... US United States USD 24.0 0 male bullet short_term_loan (0.999, 25.0] 0.0 not_funded
12037 0.0 5000.0 Food Production/Sales Food purchase much needed packaging that is sustain... US United States USD 24.0 0 male bullet short_term_loan (0.999, 25.0] 0.0 not_funded
13542 0.0 1500.0 Crafts Arts increase my inventory of handmade candles, dri... US United States USD 12.0 0 male bullet short_term_loan (0.999, 25.0] 0.0 not_funded
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
671179 0.0 25.0 Livestock Agriculture [True, u'para compara: cemento, arenya y ladri... PY Paraguay USD 13.0 0 female monthly short_term_loan (0.999, 25.0] 0.0 not_funded
671181 0.0 25.0 Livestock Agriculture Reviewed loan use in english. PK Pakistan PKR 13.0 0 female monthly short_term_loan (0.999, 25.0] 0.0 not_funded
671182 0.0 125.0 Livestock Agriculture Pretend the flagged issue was addressed by KC. MX Mexico MXN 13.0 0 female, female monthly short_term_loan (0.999, 25.0] 0.0 not_funded
671184 0.0 875.0 Livestock Agriculture Translated loan use to english. BO Bolivia BOB 13.0 0 female, female monthly short_term_loan (0.999, 25.0] 0.0 not_funded
671188 0.0 250.0 Livestock Agriculture Reviewed loan use in english. GH Ghana GHS 13.0 0 female monthly short_term_loan (0.999, 25.0] 0.0 not_funded

3278 rows × 16 columns

Note

From the above table i see that perharps the use could have a part to play and maybe where the applications came from? I will dig deeper to see if my assumptions are correct.

In [45]:
#wheh a loan is a fake or when the loan was not originally applied for in english does it affect its funding ?
df.loc[df["use"].str.startswith('Pretend') | df["use"].str.startswith('Kiva')|df["use"].str.startswith('Translated')|df["use"].str.startswith('Reviewed') ,:]        
Out[45]:
funded_amount loan_amount activity sector use country_code country currency term_in_months lender_count borrower_genders repayment_interval term_cat term_label rate_of_financing(%) loan_status
657556 0.0 1000.0 Tourism Services Reviewed loan use in english. KH Cambodia KHR 12.0 0 female, female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
657584 0.0 700.0 Tourism Services Reviewed loan use in english. KH Cambodia KHR 9.0 0 female, female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
657717 0.0 1975.0 Tourism Services Reviewed loan use in english. KH Cambodia KHR 25.0 0 female, female, female, female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
658107 0.0 925.0 Tourism Services Reviewed loan use in english. KH Cambodia KHR 11.0 0 female, female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
658437 0.0 2075.0 Tourism Services Reviewed loan use in english. KH Cambodia KHR 26.0 0 female, female, female monthly medium_term_loan (25.0, 61.0] 0.00 not_funded
659111 0.0 1000.0 Tourism Services Reviewed loan use in english. KH Cambodia KHR 20.0 0 female, female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
661362 0.0 625.0 Transportation Transportation Translated loan use to english. PE Peru PEN 8.0 0 male monthly short_term_loan (0.999, 25.0] 0.00 not_funded
661679 0.0 1550.0 Dental Health Translated loan use to english. PE Peru PEN 8.0 0 male monthly short_term_loan (0.999, 25.0] 0.00 not_funded
661941 0.0 925.0 Tourism Services Translated loan use to english. PE Peru PEN 8.0 0 male monthly short_term_loan (0.999, 25.0] 0.00 not_funded
661956 0.0 1550.0 Restaurant Food Translated loan use to english. PE Peru PEN 8.0 0 female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
662377 0.0 6425.0 Sewing Services Translated loan use to english. PE Peru PEN 6.0 0 male, female, male, male, male, male, female, ... irregular short_term_loan (0.999, 25.0] 0.00 not_funded
662552 0.0 6575.0 Grocery Store Food Translated loan use to english. PE Peru PEN 8.0 0 male, male, female, female, female, female, fe... monthly short_term_loan (0.999, 25.0] 0.00 not_funded
662746 0.0 625.0 Cosmetics Sales Retail Translated loan use to english. PE Peru PEN 8.0 0 female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
662793 0.0 625.0 Electronics Sales Retail Translated loan use to english. PE Peru PEN 8.0 0 male monthly short_term_loan (0.999, 25.0] 0.00 not_funded
662846 0.0 1225.0 Tourism Services Translated loan use to english. PE Peru PEN 8.0 0 male monthly short_term_loan (0.999, 25.0] 0.00 not_funded
662849 0.0 775.0 Clothing Sales Clothing Translated loan use to english. PE Peru PEN 8.0 0 female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
663601 0.0 5525.0 Retail Retail Translated loan use to english. PE Peru PEN 7.0 0 male, female, female, female, female, female, ... monthly short_term_loan (0.999, 25.0] 0.00 not_funded
663824 0.0 1075.0 Home Appliances Personal Use Translated loan use to english. PE Peru PEN 6.0 0 female irregular short_term_loan (0.999, 25.0] 0.00 not_funded
663855 0.0 1700.0 Transportation Transportation Translated loan use to english. PE Peru PEN 8.0 0 female, female, female, female, male monthly short_term_loan (0.999, 25.0] 0.00 not_funded
664078 0.0 1250.0 Retail Retail Translated loan use to english. PE Peru PEN 6.0 0 female irregular short_term_loan (0.999, 25.0] 0.00 not_funded
670969 0.0 675.0 Retail Retail Translated loan use to english. CO Colombia COP 26.0 0 female bullet medium_term_loan (25.0, 61.0] 0.00 not_funded
671027 0.0 300.0 Cattle Agriculture Translated loan use to english. SV El Salvador USD 15.0 0 male bullet short_term_loan (0.999, 25.0] 0.00 not_funded
671092 0.0 1000.0 Cattle Agriculture Translated loan use to english. EC Ecuador USD 38.0 0 female monthly medium_term_loan (25.0, 61.0] 0.00 not_funded
671150 0.0 75.0 Livestock Agriculture Pretend the issue with spanish loan was addres... MX Mexico MXN 13.0 0 female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
671152 25.0 25.0 Livestock Agriculture Reviewed loan use in english. PK Pakistan PKR 13.0 1 female monthly short_term_loan (0.999, 25.0] 100.00 fully_funded
671153 75.0 125.0 Livestock Agriculture Pretend the flagged issue was addressed by KC. MX Mexico MXN 13.0 3 female, female monthly short_term_loan (0.999, 25.0] 60.00 avg_funding
671154 0.0 725.0 Livestock Agriculture Translated loan use to english. BO Bolivia BOB 13.0 0 female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
671155 50.0 875.0 Livestock Agriculture Translated loan use to english. BO Bolivia BOB 13.0 2 female, female monthly short_term_loan (0.999, 25.0] 5.71 very_poor_funding
671156 75.0 250.0 Livestock Agriculture Reviewed loan use in english. GH Ghana GHS 13.0 3 female monthly short_term_loan (0.999, 25.0] 30.00 very_poor_funding
671157 75.0 75.0 Livestock Agriculture Pretend the issue with spanish loan was addres... MX Mexico MXN 13.0 3 female monthly short_term_loan (0.999, 25.0] 100.00 fully_funded
671159 50.0 725.0 Livestock Agriculture Translated loan use to english. BO Bolivia BOB 13.0 2 female monthly short_term_loan (0.999, 25.0] 6.90 very_poor_funding
671167 0.0 25.0 Livestock Agriculture Kiva Coordinator fixed issue loan (no longer v... KE Kenya KES 13.0 0 female, female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
671168 0.0 25.0 Livestock Agriculture Pretend the issue with loan got addressed by K... KE Kenya KES 13.0 0 female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
671171 0.0 25.0 Games Entertainment Kiva Coordinator replaced loan use. Should see... KE Kenya KES 13.0 0 female, female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
671181 0.0 25.0 Livestock Agriculture Reviewed loan use in english. PK Pakistan PKR 13.0 0 female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
671182 0.0 125.0 Livestock Agriculture Pretend the flagged issue was addressed by KC. MX Mexico MXN 13.0 0 female, female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
671184 0.0 875.0 Livestock Agriculture Translated loan use to english. BO Bolivia BOB 13.0 0 female, female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
671188 0.0 250.0 Livestock Agriculture Reviewed loan use in english. GH Ghana GHS 13.0 0 female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
In [46]:
df.loc[(df.use == 'not_provided') & (df.funded_amount== 0),:]
Out[46]:
funded_amount loan_amount activity sector use country_code country currency term_in_months lender_count borrower_genders repayment_interval term_cat term_label rate_of_financing(%) loan_status
36075 0.0 600.0 Personal Medical Expenses Health not_provided BO Bolivia USD 20.0 0 female monthly short_term_loan (0.999, 25.0] 0.0 not_funded
40489 0.0 1050.0 Cloth & Dressmaking Supplies Retail not_provided KE Kenya KES 14.0 0 female monthly short_term_loan (0.999, 25.0] 0.0 not_funded
53392 0.0 1025.0 Retail Retail not_provided NI Nicaragua USD 17.0 0 female monthly short_term_loan (0.999, 25.0] 0.0 not_funded
59399 0.0 325.0 Agriculture Agriculture not_provided SV El Salvador USD 13.0 0 female bullet short_term_loan (0.999, 25.0] 0.0 not_funded
61296 0.0 500.0 Farming Agriculture not_provided SV El Salvador USD 14.0 0 female irregular short_term_loan (0.999, 25.0] 0.0 not_funded
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
643472 0.0 425.0 Farming Agriculture not_provided UG Uganda UGX 12.0 0 female monthly short_term_loan (0.999, 25.0] 0.0 not_funded
646452 0.0 3000.0 Clothing Clothing not_provided US United States USD 24.0 0 female bullet short_term_loan (0.999, 25.0] 0.0 not_funded
647114 0.0 5000.0 Education provider Education not_provided US United States USD 18.0 0 female bullet short_term_loan (0.999, 25.0] 0.0 not_funded
671151 0.0 25.0 Livestock Agriculture not_provided KE Kenya KES 13.0 0 female monthly short_term_loan (0.999, 25.0] 0.0 not_funded
671174 0.0 25.0 Games Entertainment not_provided KE Kenya KES 13.0 0 female monthly short_term_loan (0.999, 25.0] 0.0 not_funded

115 rows × 16 columns

In [47]:
df.loc[(df.use == 'not_provided') ,:]
Out[47]:
funded_amount loan_amount activity sector use country_code country currency term_in_months lender_count borrower_genders repayment_interval term_cat term_label rate_of_financing(%) loan_status
140 2975.0 2975.0 Food Production/Sales Food not_provided TZ Tanzania TZS 10.0 110 female monthly short_term_loan (0.999, 25.0] 100.00 fully_funded
145 1200.0 1200.0 Personal Expenses Personal Use not_provided PE Peru PEN 20.0 44 female monthly short_term_loan (0.999, 25.0] 100.00 fully_funded
170 4250.0 4250.0 Catering Food not_provided TZ Tanzania TZS 10.0 116 female monthly short_term_loan (0.999, 25.0] 100.00 fully_funded
412 2350.0 2350.0 Beauty Salon Services not_provided TZ Tanzania TZS 10.0 75 female monthly short_term_loan (0.999, 25.0] 100.00 fully_funded
414 725.0 725.0 Agriculture Agriculture not_provided SV El Salvador USD 20.0 19 female monthly short_term_loan (0.999, 25.0] 100.00 fully_funded
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
659604 5625.0 10000.0 Weaving Arts not_provided BT Bhutan USD 14.0 210 female irregular short_term_loan (0.999, 25.0] 56.25 avg_funding
660788 1975.0 1975.0 Home Energy Personal Use not_provided PS Palestine USD 27.0 39 female monthly medium_term_loan (25.0, 61.0] 100.00 fully_funded
661718 800.0 1600.0 Furniture Making Manufacturing not_provided HT Haiti HTG 13.0 27 female irregular short_term_loan (0.999, 25.0] 50.00 avg_funding
671151 0.0 25.0 Livestock Agriculture not_provided KE Kenya KES 13.0 0 female monthly short_term_loan (0.999, 25.0] 0.00 not_funded
671174 0.0 25.0 Games Entertainment not_provided KE Kenya KES 13.0 0 female monthly short_term_loan (0.999, 25.0] 0.00 not_funded

3899 rows × 16 columns

Note

Just about 3 % of of applications with no use were not funded.

In [48]:
#What sectors are fully_funded
df_7 = df.groupby('sector')['loan_status'].apply(lambda x: (x=='fully_funded').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_7.head(5)
Out[48]:
sector count
0 Agriculture 160999
6 Food 127300
11 Retail 112921
12 Services 40966
2 Clothing 29572
In [49]:
#what sectors are not funded
df_7 = df.groupby('sector')['loan_status'].apply(lambda x: (x=='not_funded').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_7.head(5)
Out[49]:
sector count
6 Food 682
11 Retail 617
0 Agriculture 593
12 Services 589
2 Clothing 202
In [50]:
#what countries have applications that were  not funded

df_7 = df.groupby('country')['loan_status'].apply(lambda x: (x=='not_funded').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_7.head(5)
Out[50]:
country count
80 United States 837
34 Kenya 526
54 Pakistan 242
22 El Salvador 223
11 Cambodia 170

Findings

From my above analysis the following can be noted

  • The rate at which applications are funded has no dependency on sectors ,as it is shared almost equally across the board
  • The rate at which applications are funded has no dependency on the country ,as it is shared almost equally across the board. Except for USA where we can assume that investors will rather not invest in USA as it is a country in the developed world and therefore not in need of much help.
  • About 80% of applications that were not originally applied for in English or that were fake were not funded . However this represents just a very small minority of applications that were not funded ,and therefore is not a significant factor.

Conclusion

  • Kiva has a very high Funding Rate
  • Investors will not invest if they do not understand the understand the language in which the application was written in.

Which countries had the most applications?¶

In [51]:
df_c  = df.groupby(by='country',as_index=False).agg(number_of_applications=('loan_status','size')).sort_values(['number_of_applications'],ascending=False)
df_c.head(5)
Out[51]:
country number_of_applications
59 Philippines 154323
34 Kenya 73554
22 El Salvador 39845
11 Cambodia 27067
54 Pakistan 26475
In [52]:
# PLOT 1 TO BE EVALUATED

fig = px.scatter_geo(df_c, locations="country",
                     size="number_of_applications",
                     locationmode = 'country names',
                     color= 'country',
                     size_max=30,
                     title=" Fig 2.0 : Distribution of KIVA Projects accross the world 2016"
                     
                     
                    )

fig.update_layout(plot_bgcolor="white")
            
fig.show()

Note

From our area map we see that most of our applications came from South and Latin America, Asia and Africa . Are applications in these regions sector specific? This brings me to my next question. What sector has the most funded in the top three countries.

In [53]:
#Philipines
df_4 = df.groupby('sector')['country'].apply(lambda x: (x=='Philippines').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_4.head()
Out[53]:
sector count
11 Retail 52510
6 Food 41864
0 Agriculture 35158
8 Housing 4697
13 Transportation 4547
In [54]:
#Kenya
df_5 = df.groupby('sector')['country'].apply(lambda x: (x=='Kenya').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_5.head()
Out[54]:
sector count
0 Agriculture 32883
6 Food 13763
11 Retail 10094
12 Services 5408
2 Clothing 4667
In [55]:
#El Salvador
df_6 = df.groupby('sector')['country'].apply(lambda x: (x=='El Salvador').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_6.head()
Out[55]:
sector count
0 Agriculture 14008
6 Food 8307
8 Housing 5965
11 Retail 5666
2 Clothing 1905

Conclusion

From the above tables, I notice that the countries with the highest numbers of applications all fall around the same the sectors. This is proof of what people in these developing areas are in most need of.

There are therefore no irregulaties here and will be moving on with my analysis.

What sectors have the most applications?¶

In [56]:
df_s  = df.groupby(by='sector',as_index=False).agg(applications=('sector','size')).sort_values(['applications'],ascending=False)
df_s
Out[56]:
sector applications
0 Agriculture 174624
6 Food 135359
11 Retail 123186
12 Services 44844
2 Clothing 32554
8 Housing 30155
4 Education 30128
10 Personal Use 25533
13 Transportation 15392
1 Arts 11938
7 Health 9190
3 Construction 6263
9 Manufacturing 6204
5 Entertainment 829
14 Wholesale 634
In [57]:
barplot4 = px.bar(data_frame=df_s,
                 x="sector",
                 y="applications",
                 color="sector",
                 title="Fig 3.0: Number of Applications per Sector 2016",
                 template= "plotly_dark",
                 labels={"applications":"Total number of applications"}
                 
                 )

barplot4.show()               

Note

From the above table it is no doubt that the top three sectors from where we have applications are :Agriculture,Food and Retail as these are the main sources of income and also a main need in the areas with the highest applications. To prove this let us take agriculture and see the top countries that need funding for agriculture.

In [58]:
#Agriculture
df_7 = df.groupby('country')['sector'].apply(lambda x: (x=='Agriculture').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_7
Out[58]:
country count
59 Philippines 35158
34 Kenya 32883
22 El Salvador 14008
11 Cambodia 10542
58 Peru 6585
... ... ...
50 Namibia 0
44 Mauritania 0
39 Lesotho 0
37 Lao People's Democratic Republic 0
0 Afghanistan 0

87 rows × 2 columns

Conclusion

We see our dataset till this point is relatively homogenous and is spread evenly accross the board.

What sector had the highest inflow of cash?¶

In [59]:
df_cf =  df.groupby(by='sector',as_index=False).agg({'funded_amount': np.sum}).sort_values(['funded_amount'],ascending=False)
df_cf.round(2)
Out[59]:
sector funded_amount
0 Agriculture 132669060.0
6 Food 114770835.0
11 Retail 90523555.0
12 Services 42879735.0
2 Clothing 34750195.0
4 Education 29701870.0
8 Housing 20191325.0
10 Personal Use 12692225.0
1 Arts 11778255.0
13 Transportation 9805775.0
7 Health 9159825.0
3 Construction 6320365.0
9 Manufacturing 5355900.0
5 Entertainment 1023095.0
14 Wholesale 918900.0
In [60]:
# PLOT 2 TO BE EVALUATED

import plotly.graph_objects as go
from plotly.offline import iplot

trace1  = go.Scatter(
        mode='lines+markers',
        x = df_cf['sector'],
        y = df_cf['funded_amount'],
        name="total loan per sector",
        marker_color='chocolate'
    )

trace2 = go.Bar(
        x = df_s['sector'],
        y = df_s['applications'],
        name="total applications per sector",
        yaxis='y2',
        marker_color ='darkcyan',
        marker_line_width=1.5,
        marker_line_color='rgb(8,48,107)',
        opacity=0.5
    )

data = [trace1, trace2]

layout = go.Layout(
    title_text='Fig 4: Applications per sector and total investment 2016',
    template= "simple_white",
    yaxis=dict(
        range = [900000, 140000000],
        side = 'right'
        
    ),
    yaxis2=dict(
        overlaying='y',
        anchor='y3',
    )
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='multiple-axes-double')

Note

From our results above, we can notice that there is no abnormaly here. The sectors with the highest number of applications received the highest funds in 2016.

In order to enquire further i want to find out what sector received the highest investment in one project.This will continue further to find out in what sector were lenders more predisposed to invest?

In [61]:
df_mx =  df.groupby(by='sector',as_index=False).agg({'funded_amount': np.max}).sort_values(['funded_amount'],ascending=False)
df_mx.round(2)
Out[61]:
sector funded_amount
0 Agriculture 100000.0
2 Clothing 50000.0
3 Construction 50000.0
4 Education 50000.0
7 Health 50000.0
9 Manufacturing 50000.0
11 Retail 50000.0
12 Services 50000.0
14 Wholesale 50000.0
1 Arts 40000.0
6 Food 35000.0
13 Transportation 35000.0
8 Housing 24450.0
10 Personal Use 13425.0
5 Entertainment 10000.0
In [62]:
#Is there any peculiarity with projects that cost 50,000 USD dollars and above?
df.loc[(df.funded_amount >= 50000) ,:]
Out[62]:
funded_amount loan_amount activity sector use country_code country currency term_in_months lender_count borrower_genders repayment_interval term_cat term_label rate_of_financing(%) loan_status
34196 50000.0 50000.0 Renewable Energy Products Retail to buy and sell Barefoot Power's Solar Lightin... PE Peru USD 14.0 1446 male irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
43182 50000.0 50000.0 Renewable Energy Products Retail To buy and sell Barefoot Power's solar lightin... KE Kenya USD 16.0 1491 male bullet short_term_loan (0.999, 25.0] 100.0 fully_funded
53634 50000.0 50000.0 Renewable Energy Products Retail To buy and sell Barefoot Power solar lighting. UG Uganda USD 14.0 1581 female bullet short_term_loan (0.999, 25.0] 100.0 fully_funded
70499 100000.0 100000.0 Agriculture Agriculture create more than 300 jobs for women and farmer... HT Haiti USD 75.0 2986 female irregular long_term_loan (61.0, 121.0] 100.0 fully_funded
126839 50000.0 50000.0 Agriculture Agriculture to buy and plant resin producing pine trees. T... MX Mexico USD 144.0 586 male, male, male, male, male, male, male, female irregular extra_long_term_loan (121.0, 158.158] 100.0 fully_funded
163727 50000.0 50000.0 Agriculture Agriculture to fund its growing loan book and further deve... KE Kenya USD 38.0 1343 female bullet medium_term_loan (25.0, 61.0] 100.0 fully_funded
210975 50000.0 50000.0 Agriculture Agriculture To work with 17 farming cooperatives to proces... RW Rwanda USD 8.0 1302 male bullet short_term_loan (0.999, 25.0] 100.0 fully_funded
223120 50000.0 50000.0 Higher education costs Education to provide loans and career services for the l... MX Mexico USD 14.0 960 male irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
408295 50000.0 50000.0 Clothing Clothing to set up a garment social business that will ... AL Albania USD 97.0 1626 male monthly long_term_loan (61.0, 121.0] 100.0 fully_funded
408465 50000.0 50000.0 Construction Construction not_provided PE Peru USD 73.0 1310 female irregular long_term_loan (61.0, 121.0] 100.0 fully_funded
447374 50000.0 50000.0 Agriculture Agriculture to increase smallholder farmers’ incomes by bu... UG Uganda USD 13.0 1555 male monthly short_term_loan (0.999, 25.0] 100.0 fully_funded
490191 50000.0 50000.0 Health Health To purchase raw materials in order to produce ... GH Ghana USD 14.0 1569 male irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
492809 50000.0 50000.0 Agriculture Agriculture to expand weather, farming information and fin... GH Ghana USD 14.0 1481 male irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
494470 50000.0 50000.0 Agriculture Agriculture To pay smallholder coffee farmers in rural Ken... KE Kenya USD 15.0 1441 female irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
496715 50000.0 50000.0 Agriculture Agriculture to fund the harvest of seeds of 6,000 smallhol... MG Madagascar USD 22.0 1606 male irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
509048 50000.0 50000.0 Poultry Agriculture to purchase chicken feed & a delivery vehicle ... TZ Tanzania USD 14.0 1765 female irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
523634 50000.0 50000.0 Health Health to mitigate CO2 & household air pollution, whi... MW Malawi USD 24.0 1465 male irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
523659 50000.0 50000.0 Electronics Sales Retail to train & equip 200 rural merchants in Mozamb... MZ Mozambique USD 14.0 1410 female irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
526100 50000.0 50000.0 Renewable Energy Products Retail to distribute 200+ innovative & affordable pay... ZM Zambia USD 14.0 1550 male irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
538248 50000.0 50000.0 Agriculture Agriculture to enable 5,000 additional small-holder farmer... KE Kenya USD 14.0 1589 male monthly short_term_loan (0.999, 25.0] 100.0 fully_funded
541006 50000.0 50000.0 Goods Distribution Wholesale to bolster logistics of affordable water distr... HT Haiti USD 14.0 1349 male irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
544548 50000.0 50000.0 Health Health to provide community trauma services in South ... SS South Sudan USD 8.0 1609 female bullet short_term_loan (0.999, 25.0] 100.0 fully_funded
548513 50000.0 50000.0 Renewable Energy Products Retail to distribute solar home systems throughout ru... ZW Zimbabwe USD 13.0 720 male monthly short_term_loan (0.999, 25.0] 100.0 fully_funded
563074 50000.0 50000.0 Renewable Energy Products Retail to provide life-changing clean cookstoves and ... KE Kenya USD 14.0 1402 female irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
565733 50000.0 50000.0 Agriculture Agriculture to pay 600 farming families 100% above market ... EC Ecuador USD 14.0 1689 female irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
583307 50000.0 50000.0 Agriculture Agriculture to support 800+ farmers by improving their pro... GT Guatemala USD 20.0 1671 male monthly short_term_loan (0.999, 25.0] 100.0 fully_funded
586970 50000.0 50000.0 Renewable Energy Products Retail to generate income to over 600 fishermen in Ta... TZ Tanzania USD 14.0 782 female irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
604502 50000.0 50000.0 Agriculture Agriculture to add value and jobs to the local economy by ... BJ Benin USD 12.0 1519 male irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
614869 50000.0 50000.0 Furniture Making Manufacturing create jobs through environmentally-friendly m... KE Kenya USD 20.0 1094 male irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
614922 50000.0 50000.0 Water Distribution Services to set up 13 new clean water businesses in nor... GH Ghana USD 14.0 1688 female irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
618264 50000.0 50000.0 Farming Agriculture to provide income opportunities in remote Indo... ID Indonesia USD 14.0 1564 male irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
621860 50000.0 50000.0 Renewable Energy Products Retail to distribute 400 pay-as-you-go solar home sys... KE Kenya USD 14.0 1170 male irregular short_term_loan (0.999, 25.0] 100.0 fully_funded
631904 50000.0 50000.0 Agriculture Agriculture double cashew nut export output and hire about... CI Cote D'Ivoire USD 10.0 1706 female irregular short_term_loan (0.999, 25.0] 100.0 fully_funded

Note

From the tables above we see that the biggest project was worth 100,000 USD and in the Agricultural sector. Considerably so,as it was funded towards a big project with the creation about 300 jobs. I will therefore not consider this as an outlier but as an extreme investment.

When i look further down the list and compare the number of lenders per investment per sector, it brings me to an investor analysis and i would like to dig deeper.

Lender/Investor Behavior towards projects vis a vis sectors and Countries¶

In this section,my business question will be:

How much are lenders willing to invest? What is the propensity to invest?

It will be good for Kiva Organization to know what sectors its investors will be willing to put in more money. The Kiva sectores mandates a minimum of 25 USD investment per person. But how much further are investors willing to invest above this amount?

Let us note that this analysis will be based on how much investors are willing to invest per person per sector and not as a group.

Therefore to get an answer to my question, i will create a new column that shows how much money a lender contributed per project. That way i will be able to know averagely how much an investor is willing to pay for a project in a particular sector or in a particular country.

In [63]:
#New column "inv_per_lender"

df["inv_per_lender"]=round((df['funded_amount']/df['lender_count']),2)

df["inv_per_lender"].fillna(0, inplace = True)
df.head(5)
Out[63]:
funded_amount loan_amount activity sector use country_code country currency term_in_months lender_count borrower_genders repayment_interval term_cat term_label rate_of_financing(%) loan_status inv_per_lender
0 300.0 300.0 Fruits & Vegetables Food To buy seasonal, fresh fruits to sell. PK Pakistan PKR 12.0 12 female irregular short_term_loan (0.999, 25.0] 100.0 fully_funded 25.00
1 575.0 575.0 Rickshaw Transportation to repair and maintain the auto rickshaw used ... PK Pakistan PKR 11.0 14 female, female irregular short_term_loan (0.999, 25.0] 100.0 fully_funded 41.07
2 150.0 150.0 Transportation Transportation To repair their old cycle-van and buy another ... IN India INR 43.0 6 female bullet medium_term_loan (25.0, 61.0] 100.0 fully_funded 25.00
3 200.0 200.0 Embroidery Arts to purchase an embroidery machine and a variet... PK Pakistan PKR 11.0 8 female irregular short_term_loan (0.999, 25.0] 100.0 fully_funded 25.00
4 400.0 400.0 Milk Sales Food to purchase one buffalo. PK Pakistan PKR 14.0 16 female monthly short_term_loan (0.999, 25.0] 100.0 fully_funded 25.00
In [64]:
df_is = df.groupby(by='sector',as_index=False).agg(applications_2016=('country', 'size'), funded_2016=('funded_amount', 'sum'), inv_per_lender=('inv_per_lender','mean') )                                                  
df_is                                                 
                                                     
Out[64]:
sector applications_2016 funded_2016 inv_per_lender
0 Agriculture 174624 132669060.0 47.569026
1 Arts 11938 11778255.0 60.762197
2 Clothing 32554 34750195.0 78.995450
3 Construction 6263 6320365.0 56.376962
4 Education 30128 29701870.0 39.877026
5 Entertainment 829 1023095.0 48.978299
6 Food 135359 114770835.0 77.002489
7 Health 9190 9159825.0 51.929370
8 Housing 30155 20191325.0 39.647853
9 Manufacturing 6204 5355900.0 52.163453
10 Personal Use 25533 12692225.0 42.697106
11 Retail 123186 90523555.0 77.134103
12 Services 44844 42879735.0 62.618802
13 Transportation 15392 9805775.0 53.756249
14 Wholesale 634 918900.0 46.627666

Note

This will be better analysed with the use of a scatter plot.On this plot, the position of a sector will show us the number of applications and the total amount funded in 2016 for that sector. Furtehr more it will show us the sector where lenders are more incited to pay for an investment.

In [65]:
# PLOT 3 TO BE EVALUATED
import plotly.express as px
fig = px.scatter(data_frame=df_is, 
                 x="applications_2016",
                 y="funded_2016", 
                 color='sector',
                 size= "inv_per_lender",
                 size_max= 30,
                 title= " Fig 5.0 :Propensity to invest per sector KIVA 2016",
                 template="plotly_white",
                 labels={"funded_2016":"total invested in 2016", 'applications_2016':"total projects in 2016"}
                 
                 )
fig.show()

Note

This scatter plot shows the total amount of money invested per sector on the KIVA platform in 2016.

The balls represent the sectors and the shape represent how much an investor is willing to pay for a project in that sector.

The sectors with a higher propensity to invest are Clothing, Retail and food. Is there a reason why Investors will invest more in this sectors? Let's find out.

In [66]:
#Question One

# What is the repayment structure in these sectors (Clothing,food,and retail)? This would help further confirm our analysis
#We want to know why the propensity to invest in higher in this sectors.

df_rs = df.groupby(by=['repayment_interval',"sector"],as_index=False).agg(applications_2016=('sector', 'count'))                                                 
df_rs.head(60)


#For lack of time the following calculations were done on paper

#Bullet Form repayment

#agriculture 23%
#clothing 3.1%
#Retail 3.9%
#Food 3.3%
Out[66]:
repayment_interval sector applications_2016
0 bullet Agriculture 41050
1 bullet Arts 1454
2 bullet Clothing 1037
3 bullet Construction 293
4 bullet Education 2156
5 bullet Entertainment 155
6 bullet Food 4583
7 bullet Health 138
8 bullet Housing 875
9 bullet Manufacturing 451
10 bullet Personal Use 3134
11 bullet Retail 4896
12 bullet Services 5052
13 bullet Transportation 406
14 bullet Wholesale 56
15 irregular Agriculture 45716
16 irregular Arts 4605
17 irregular Clothing 13447
18 irregular Construction 1547
19 irregular Education 4288
20 irregular Entertainment 226
21 irregular Food 74882
22 irregular Health 1654
23 irregular Housing 6285
24 irregular Manufacturing 2436
25 irregular Personal Use 3522
26 irregular Retail 72486
27 irregular Services 12433
28 irregular Transportation 6360
29 irregular Wholesale 164
30 monthly Agriculture 87819
31 monthly Arts 5864
32 monthly Clothing 17936
33 monthly Construction 4412
34 monthly Education 23684
35 monthly Entertainment 448
36 monthly Food 55797
37 monthly Health 7385
38 monthly Housing 22995
39 monthly Manufacturing 3317
40 monthly Personal Use 18877
41 monthly Retail 45680
42 monthly Services 27206
43 monthly Transportation 8614
44 monthly Wholesale 414
45 weekly Agriculture 39
46 weekly Arts 15
47 weekly Clothing 134
48 weekly Construction 11
49 weekly Food 97
50 weekly Health 13
51 weekly Retail 124
52 weekly Services 153
53 weekly Transportation 12

Note

For our three sectors of interest , they are have a lower bullet repayment rate of less than 4% as opposed to Agriculture with a bullet repayment rate of 23%.

Are there any other reason why Investors will be likely to invest in the above three sectors? Our second and third questions will be.

  • Question two: How long is the duration of loans in these sectors?

  • Question three: What activities make up the biggest part of these sectors?

Question Two- How long is the duration of loans in these sectors?

To note:

  • Short term loan 0-2 years
  • Medium Term 2-5years
  • Long term 5-10years
  • Extra long term 10 years and above
In [67]:
#Agriculture

df_ag = df.groupby('term_cat')['sector'].apply(lambda x: (x=='Agriculture').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_ag


#For lack of time ,the following calculations were done on paper

# Short term loan 90%
#Medium term loan 9.6%
#long term 0.15%
Out[67]:
term_cat count
0 short_term_loan 157336
1 medium_term_loan 16933
2 long_term_loan 277
3 extra_long_term_loan 78
In [68]:
#Clothing

df_ag = df.groupby('term_cat')['sector'].apply(lambda x: (x=='Clothing').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_ag

#For lack of time ,the following calculations were done on paper

# Short term loan 94%
#Medium term loan 5.4%
#long term 0.06%
#extra long term 0
Out[68]:
term_cat count
0 short_term_loan 30772
1 medium_term_loan 1760
2 long_term_loan 22
3 extra_long_term_loan 0
In [69]:
#Retail

df_ag = df.groupby('term_cat')['sector'].apply(lambda x: (x=='Retail').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_ag

#For lack of time ,the following calculations were done on paper

# Short term loan 96%
#Medium term loan 3.2%
#long term 0.005%
#extra long term 0
Out[69]:
term_cat count
0 short_term_loan 119212
1 medium_term_loan 3967
2 long_term_loan 7
3 extra_long_term_loan 0

Question 3: What activities make up the biggest part of these sectors?

In [70]:
#Clothing 

df_ac = df_1.groupby('activity')['sector'].apply(lambda x: (x=='Clothing').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_ac.head(5)

fig = px.pie(df_ac.head(5), 
             values='count', 
             names='activity',
             title= "Fig 6.1: Activity distributed across the Clothing Sector"
            )


fig.update_traces(textposition='inside')
fig.update_layout(
    uniformtext_minsize=10, uniformtext_mode='hide')
    
    
fig.show()
In [71]:
#Retail 

df_ar = df.groupby('activity')['sector'].apply(lambda x: (x=='Retail').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_ar

fig = px.pie(df_ar.head(34), 
             values='count', 
             names='activity',
             title= "fig 6.2 :Activity distributed across the Retail Sector"
            )


fig.update_traces(textposition='inside')
fig.update_layout(
    uniformtext_minsize=10, uniformtext_mode='hide')
    
    
fig.show()
In [72]:
#Food 

df_af = df.groupby('activity')['sector'].apply(lambda x: (x=='Food').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_af.head(20)

fig = px.pie(df_af.head(20), 
             values='count', 
             names='activity',
             title= "fig 6.3: Activity distributed across the Food Sector"
            )


fig.update_traces(textposition='inside')
fig.update_layout(
 uniformtext_minsize=10, uniformtext_mode='hide')
    
    
fig.show()

FINAL RESULTS FROM OUR EDA¶

From our scatter plot (Fig 5.0) there are few things which are by now not strange to us. We can clearly see the number of applications per sectors and the total amount funded, but what we can also see is the size of our balls. The size show us Investors' readiness to invest in a particular sector.

As we have seen above: Agriculture,Food and Retail have the highest loan applications and a corresponding high amount of funding. However the clothing ,Food and Retail sector are more attractive to investors than the agricultural sector.

Going into the details; an investor is willing to invest 78 USD per project in the clothing sector and 77 USD on projects in both the Retail and Food Sectors as opposed to 47 USD in the Agricultural sector.

The question here is why?

  • Repayment Interval

The repayment interval refers to how borrowers will pay back the loan granted to them. Will it be monthly,weekly,irregularly or in bullet form. Bullet refers to a form of payment where only the interest is paid back during the lifespan of the loan and theh principal will beb repaid as a lump sum at the end. Most investors do not like this type of investment especially if it comes with a low interest rate as is with th ecase with the principles of the KIVA organization: Loans to impevorished regions at low rates.

The clothing, retail and food sector have a bullet repayment rate of less than 4% as opposed to a 23% bullet repayment rate in the agricultural sector.

It is therefore of no wonder why investors will feel safer to invest in this sectors.

  • Profitability and Level of risk

With every loan, there is always a risk that the lender will not be able to refund a loan. In this case the Kiva Organisation does not guarrantee a refund and always makes sure to let investor know about this risk.

In general we all know the risk associated with the agriculture business. It can be affected by natural disasters and diseases in the case of animal farming . In 2016 as we can see in fig 6.1 Clothing sales made up 68% of the total projects in the clothing sector. Therefore a huge part of investment in this sector was made up of clothes buy and resale. The world clohing business is one of the biggest and safest industry in the world,and naturally investors will feel safe investing.

In fig 6.2 General store make up about 50% of the retail sector. This is safe investment as the goods sold in a general store are mostly non perishable and investors are assured of their return on investments.

In fig 6.3 Food production and sales made up 20% of the projects in this sector and in general the activities in this sector are mostly for the sale of food. Food being a need and a basic commodity, there is a lower risk of business failure.

  • Duration of Loan

The duration of loan in these sectors were also analysed and from our discovery we can see that in general the Clothinga nd Retailsectors have relatively shorter duration than the Agricultural sector. Every investor would want to invest in a sector where the payback time is not long. In th eretail sector for example, short term loan make upü 96% of the total number of projects . it also has a very low rate of long term loans at 0.005%,and none of the projects in this sector last more than 10years. The same can be said with the clothing and food sector who follow closely behind.

However in the case of the Agricultural sector, 90% of its loans were short term ,9.6 % mid term,0.15% long term and 78 of it sprojects spanned more than 10 years.

We can see that in general the repayment duration in the retail,food and clothing sectors are relatively better off than that in the agricultural sectors.

Conclusion on EDA

To conclude, an investor investing in projects through the Kiva organization will be more willing to spend more than the bare minimum which is 25 USD in the Clothing, Retail and Food industries because of higher return rate and better repayment modalities.

On the other hand i am also aware that although this is a general view accross the board, it will be interesting to see how this dynamics change from one country to another. It is on this basis that my dashboard will be created .

2. Teil - Dashboard¶

Basierend auf den vorangegangenen Daten soll ein Dashboard erstellt werden.

Das Dashboard soll:

                - mindestens 1 HTML component enthalten
                - 1 Dash Core Component mit dem Daten ausgewählt werden können
                - 1 Grafik, die sich in Abhängigkeit von der Datenauswahl ändert

Verwende folgenden Code zum Ausführen der App:

if name == 'main':

my_app.run_server(mode='inline', port=8091)
In [73]:
df_csi = df.groupby(by=['country','sector'],as_index=False).agg(applications_2016=('sector', 'count'), funded_2016=('funded_amount', 'sum'), inv_per_lender=('inv_per_lender','mean') )                                                  
df_csi 
Out[73]:
country sector applications_2016 funded_2016 inv_per_lender
0 Afghanistan Arts 2 14000.0 27.690000
1 Albania Agriculture 719 976925.0 31.333060
2 Albania Arts 5 8375.0 29.556000
3 Albania Clothing 87 153925.0 31.759770
4 Albania Construction 30 35325.0 31.784333
... ... ... ... ... ...
985 Zimbabwe Personal Use 1 1400.0 53.850000
986 Zimbabwe Retail 765 764500.0 39.301399
987 Zimbabwe Services 230 209875.0 39.358391
988 Zimbabwe Transportation 9 11550.0 29.873333
989 Zimbabwe Wholesale 16 22300.0 33.176875

990 rows × 5 columns

Note

Before we go to our dropdown let us see a view of our data in Kenya.

In [74]:
import plotly.express as px
fig = px.scatter(data_frame=df_csi.loc[df_csi.country=="Kenya",:], 
                 x="applications_2016",
                 y="funded_2016", 
                 color='sector',
                 size= "inv_per_lender",
                 size_max= 30,
                 title= "Fig 5.1 : Propensity to invest per sector in Kenya 2016",
                 template="plotly_white",
                 labels={"funded_2016":"total invested in 2016", 'applications_2016':"total projects in 2016"}
                 
                 )
fig.show()

Note

From the above plot we can see that in 2016 in Kenya, Agriculture was theh sector with the highest number of applied projects (32,000) and a total funding of 16,2 million USD with an investor paying approximately 38 USD per project.

The food sector comes far behind with a total number of projects at 13,000 ,total funding at 4.2 million, meanwhile investors were willing to invest 43 USD per project.

In [75]:
from jupyter_dash import JupyterDash
from dash import dcc


from dash import html


from dash.dependencies import Input, Output
In [76]:
# Dashboard Creation


external_stylesheet = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

my_kiva_app = JupyterDash(__name__, external_stylesheets=external_stylesheet)

my_kiva_app.layout = html.Div([
                 
                               html.H1("Kiva Country,Sector and Investor Analysis 2016",
                                      style={"text-align":"center"}
                                      ),
                       
                                html.Div([
                                      
                                      html.Label("Choose country:",
                                                style={'font-weight': 'bold'}),
                                     
    
                                    html.Br(), 
    
    
    
                                dcc.Dropdown(id="dd_kiva_plot",
                                            options=df_csi.country.unique(),
                                            style={ 'width': '250px'},
                                            
                                            )
                                             ]),
    
    
                               dcc.Graph(id="g_kiva_scatter",
                                        figure={},
                                        className="ten columns"
                                        ),
    
                              dcc.Graph(id="g_kiva_bar",
                                        figure={},
                                        className="ten columns"
                                        )
                            

                            ])

@my_kiva_app.callback( 
                       Output("g_kiva_scatter","figure"),
                       Output("g_kiva_bar","figure"),
    
                       
                      Input("dd_kiva_plot","value")
    
    
                      )

def update_kiva_figure(selected_country):
    
    
    df_sel= df_csi.loc[df_csi["country"]==selected_country,:]
    df_sel2=df_csi.loc[df_csi["country"]==selected_country,:]
    
    
    
    g_kiva_scatter=px.scatter(data_frame=df_sel,
                     x="applications_2016",
                     y="funded_2016",
                     color='sector',
                     size= "inv_per_lender",
                     size_max= 30,         
                     title="Investor behavior towards project vis a vis sector and country",
                     template="plotly_dark",
                     labels={"funded_2016":"total invested in 2016", 'applications_2016':"total projects in 2016"}
                      
                     )
    
    g_kiva_bar=px.bar(data_frame=df_sel,
                     x="sector",
                     y="applications_2016",
                     color='sector',
                     title="Applications per country",
                     template= "plotly_dark",
                     labels={"applications_2016":"Total projects 2016"}
                      
                     )
    
    
    
    return g_kiva_scatter,g_kiva_bar



#if __name__=="__main__":
 #   my_kiva_app.run_server(mode="external", port=8091)
    
    
if __name__=="__main__":

    my_kiva_app.run_server(mode='inline', port=8103)

Note

The above dashboard has two graphs. One of them shows us Investors' propensity to invest per sector when a country is chosen. It also shows us the total amount of money that was invested in a given country and in a given sector in 2016.

and the second gives a clearer vision of how many applications KIVA received per countries .

This Dashboard gives a somewhat overall view of the business in 2016.

Conclusion¶

In [77]:
#Business Figures

#Number of Contributions

df_1.lender_count.sum()
Out[77]:
13655103
In [79]:
#Number of Applications
len(df_1.index)
Out[79]:
646833

More than 1.7 billion people around the world are unbanked and can’t access the financial services they need. Kiva is an international nonprofit, founded in 2005 in San Francisco, with a mission to expand financial access to help underserved communities thrive.

The goal of Kiva organisation is in their own words "We envision a financially inclusive world where all people hold the power to improve their lives."

They do this by crowdfunding loans and unlocking capital for the underserved, improving the quality and cost of financial services, and addressing the underlying barriers to financial access around the world.

In 2016 Kiva recived 646,833 applications from accros the globe with the most applications coming from :Asia,South and Lating America and Africa.They were able to finance these projects with the help 13,655,103 contributions at a funding rate of 92%.

Having now arrived at the end of our Analysis we can conclude the following:

  • KIVA has a very good funding rate
  • Most countries have a need in the areas of Agriculture,Food,Retail and Clothing.
  • Investors are more likely to invest in sectors where the risk of default is low
  • Kiva projects have a high rate of "irregular repayment interval" going up to about 50% for sectors like retail.
  • Kiva has a very low rate of fraudulent applications

All in all the KIVA is a very good platform for those who are seeking to invest and those who are in need of funds ,and this is proof that there are still good people in the world.

In [80]:
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

Important Notice

This Data set was gotten from teh KIVA website.

In [ ]: